Visualizing Trees of a Random Forest Model: CPSC 547 Project Proposal

Author

  • Ken Lau
Abstract

It is usually not difficult to visualize an entire classification tree in a single image without overwhelming the user with visual clutter. Interpretation becomes a problem, however, when thousands of trees must be visualized at once. Random forests have become a very common data mining algorithm because of their high accuracy and their ability to handle large numbers of input variables. A random forest generates thousands of classification trees through bootstrap sampling and random selection of feature subsets, and it predicts the target for an input by taking the majority vote of all the trees. Interpretation becomes very difficult because it is infeasible to analyze every tree individually. A single tree, however, is easy to interpret and contains a lot of useful information. For example, the label of an interior node is the input variable used to partition the feature space into two groups, chosen so that the observations within each group share the same target value as far as possible. Each leaf node corresponds to the most frequent target value within its group. This project will focus on constructing a visualization system that supports the analysis of the thousands of trees generated by a random forest.
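The training and prediction steps described above (bootstrap sampling, a random subset of features, majority voting) can be sketched in plain Python. This is a minimal illustration under our own assumptions, not the project's implementation: the names `Node`, `build_tree`, `random_forest`, and `predict_forest`, the Gini impurity criterion, the depth limit, redrawing the feature subset at every split (one common variant), and the toy data are all hypothetical choices. Interior nodes store the input variable used to split, and leaves store the most frequent target value in their group, mirroring the description of a single tree.

```python
import random
from collections import Counter


class Node:
    """Hypothetical tree node: interior nodes hold a split, leaves hold a label."""

    def __init__(self, feature=None, threshold=None, left=None, right=None, label=None):
        self.feature = feature      # index of the input variable used to split
        self.threshold = threshold  # rows with x[feature] <= threshold go left
        self.left = left
        self.right = right
        self.label = label          # most frequent target value (leaves only)


def gini(labels):
    """Gini impurity of a group of target values (0 means a pure group)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def majority(labels):
    """Most frequent target value (ties go to the value seen first)."""
    return Counter(labels).most_common(1)[0][0]


def build_tree(X, y, n_sub, max_depth, rng, depth=0):
    """Grow a binary tree, trying only n_sub randomly chosen features per split."""
    if depth >= max_depth or len(set(y)) == 1:
        return Node(label=majority(y))
    best = None
    for f in rng.sample(range(len(X[0])), n_sub):  # assumed: subset redrawn per split
        for t in sorted({row[f] for row in X}):
            left = [i for i, row in enumerate(X) if row[f] <= t]
            right = [i for i, row in enumerate(X) if row[f] > t]
            if not left or not right:
                continue
            # Weighted impurity of the two child groups; lower is a better split.
            score = (len(left) * gini([y[i] for i in left])
                     + len(right) * gini([y[i] for i in right])) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t, left, right)
    if best is None:  # no feature in the subset can separate these rows
        return Node(label=majority(y))
    _, f, t, left, right = best

    def grow(idx):
        return build_tree([X[i] for i in idx], [y[i] for i in idx],
                          n_sub, max_depth, rng, depth + 1)

    return Node(feature=f, threshold=t, left=grow(left), right=grow(right))


def predict_tree(node, x):
    """Follow the splits from the root until a leaf is reached."""
    while node.feature is not None:
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.label


def random_forest(X, y, n_trees=100, n_sub=1, max_depth=3, seed=0):
    """Grow n_trees trees, each on a bootstrap sample of the training rows."""
    rng = random.Random(seed)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # sample with replacement
        trees.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                n_sub, max_depth, rng))
    return trees


def predict_forest(trees, x):
    """Predict by majority vote over all trees."""
    return majority([predict_tree(t, x) for t in trees])


# Hypothetical toy data: both features carry the class signal.
X = [[i, 10 - i] for i in range(10)]
y = [1 if i > 5 else 0 for i in range(10)]
trees = random_forest(X, y, n_trees=100, n_sub=1, max_depth=3, seed=0)
print(predict_forest(trees, [0, 10]), predict_forest(trees, [9, 1]))
# Tally which input variable labels each tree's root (None marks a single-leaf tree).
print(Counter(t.feature for t in trees))
```

The last line hints at why aggregate views are attractive: even with only two features, reading 100 trees one by one is tedious, while a tally of root-node variables summarizes the whole forest in a single line.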

Publication date: 2014